1. INTRODUCTION

This report presents some preliminary results on the data compression 
simulation program prepared by TRW for the NASA contract "ERTS Image 
Data Compression Technique Evaluation." This report is intended to 
illustrate the typical computer output for each scene processed. 

2. BACKGROUND 


As specified in the TRW proposal, the computer program should be
capable of generating various statistical characterizations of the MSS
data and of the compression algorithms. These measures are:

o Data mean and variance in each spectral band and over all bands.

o First difference probability density functions (pdf) for each
  spectral band using the SSDI, SSDIA, and SSDIAM transforms.

o Joint spectral-spatial correlation along the scan lines.

o First difference joint probability ellipsoids.

o Overall pdf for SSDI, SSDIA, SSDIAM, and Shell symbols.

o Huffman codes for SSDI, SSDIA, SSDIAM, and Shell symbols.

o Scene entropy and average code lengths for SSDI, SSDIA, SSDIAM,
  and Shell transforms.

o Time-varying data compression for the scene.

These statistical measures will be computed for several 5 nmi x 5 nmi
object classes and 25 nmi x 25 nmi scenes to be extracted from MSS tapes.
In addition, compressed tapes will be generated and reconstructed tapes
will be made for a few selected scenes.

3. RELEVANCE TO FUTURE WORK 

The MSS tapes processed during the data analysis phase of the contract 
will produce similar computer output unless changes are made as a result 
of the Data Analysis Plan due 26 January 1973. As a result of the 
preliminary analysis several additional statistical measures such as 
line-to-line and column-to-column correlation as well as covariance
matrices would be desirable, at least for selected scenes. 

Using ERTS-A tape 1025-15103, several subscenes have been processed
with the TRW computer program. These scenes corresponded to segments 
of the image having varying degrees of data activity. The compressions 
achieved on these 5 nmi x 5 nmi subscenes produce average output bit 
rates varying from 1.8 bits per sample to 4.4 bits per sample. The
example case given in section 4 is an intermediate case requiring an 
output rate on the order of 3 bits per sample. 



4. COMPUTER SIMULATION RESULTS 

The master computer programs have been used to simulate the various 
compression algorithms and to compute the desired statistics on several 
segments of ERTS-A multispectral data. These scenes have been taken
from the ERTS digital bulk MSS tape number 1025-15103 which covers the 
Lake St. John area in Quebec, Canada and includes the cities of Alma and 
Chicoutimi and the Saguenay River. The area is shown in Figure 4.1.

The printouts generated by the CDC-6500 computer are included
in this section. The results are given first, and their interpretation
is then presented and compared with results obtained on other sections
of the scene. The output shown is from a 5 nmi x 5 nmi highly detailed
section of the scene centered at 49.2° latitude and 71.1° longitude.

Figure 4.2 shows the cross spectral-spatial correlation of
the input data as a function of distance along a scan line. This data
is plotted in Figure 4.3. The correlation is formed by obtaining the
normalized dot products of the intensity vectors $I_i$ and $I_{i+k}$,
corresponding to pairs of intensity vectors separated along the scan
line by k-1 intervening pixels. Normalization removes the effects of
scene illumination, and the closer the dot product is to unity (100%),
the higher the correlation between intensity vectors. If the data is
very active spectrally, the pair of vectors can be widely separated.
The curves correspond to the percentages of vectors a distance k apart
having normed dot products greater than a given threshold, as averaged
over the entire scene.

Figure 4.2 Cross Spectral-Spatial Correlation of the Data

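The correlation measure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the TRW program: the function name `correlation_percentages`, the representation of a scan line as a list of per-pixel band tuples, and the `threshold` parameter are all hypothetical, and the threshold values used for the report's curves are not recoverable from the text.

```python
import math

def correlation_percentages(pixels, k, threshold):
    """Percent of pixel pairs a distance k apart along one scan line whose
    normalized dot product exceeds `threshold` (a hypothetical parameter).

    pixels: list of intensity vectors, one per pixel, one component per band.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    hits = 0
    pairs = 0
    for i in range(len(pixels) - k):
        a, b = pixels[i], pixels[i + k]
        na, nb = norm(a), norm(b)
        if na == 0 or nb == 0:
            continue  # direction of an all-zero vector is undefined; skip it
        cosine = sum(x * y for x, y in zip(a, b)) / (na * nb)
        pairs += 1
        if cosine > threshold:
            hits += 1
    return 100.0 * hits / pairs if pairs else 0.0

# Four pixels, four bands. The second pixel is the first at twice the
# illumination, so normalization makes their dot product equal to unity.
line = [(10, 20, 30, 40), (20, 40, 60, 80), (40, 10, 5, 1), (11, 19, 31, 39)]
print(correlation_percentages(line, 1, 0.99))
print(correlation_percentages(line, 3, 0.99))
```

Averaging the result over every scan line of a scene, for each k, would give one curve of the kind plotted in Figure 4.3.
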
The joint probability distribution function (pdf) of the first order
differences (as obtained by the SSDI algorithm) over the scene is given
in Figures 4.4 through 4.9. The set of output products spans all six
possible pairs of bands. In each plot the joint occurrence of (0, 0)
is normalized to 100 and this normalizing factor is used to multiply all
other joint output occurrences. If a pair occurs less than one percent
as often as (0, 0), it is not displayed, in order to simplify
the figures.

Figure 4.10 shows the mean and variance of each band as well as 
the overall mean and variance of the scene. Figure 4.11 gives the 
pdf of the first differences as obtained by the SSDI algorithm. Only 
difference levels in the range [-18, 18] are given since levels beyond 
these normally occur far less than one percent of the time. Figures 4.12
and 4.13 give the first difference pdf as obtained by the SSDIA and
SSDIAM algorithms. Note that the SSDIA first differences have a much
smaller variance than those obtained by the SSDI and the one percent
occurrence cuts off at a lower level. The SSDIAM further decreases the
variance but increases the probability of +1 and -1 due to the one bit 
mapping of this algorithm. Again, the SSDIAM produces a one percent 
cutoff at a lower difference level than the SSDIA. In general, compression 
increases as the variance decreases. 

Figure 4.5 JOINT PROBABILITY ELLIPSE - BAND 1 VS. BAND

Figure 4.16 Shell Huffman Code

The pdfs of the SSDI, SSDIA, and SSDIAM symbols are given in Figure 4.14.
As in Figures 4.11 through 4.13, an improvement can be observed in the
distribution as we progress from SSDI to SSDIAM, implying increasing
compression. Figure 4.15 shows the probability of shell locations for
the given scene. Level 1 implies that all SSDI symbols are simultaneously
zero for a pixel. Level 2 implies that the greatest symbol magnitude
is 1 for a pixel. The probability distribution resembles a chi-squared
($\chi^2$) distribution with peak at level 2 and a slow fall-off of the tail.
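
The two shell levels defined above suggest a simple rule, sketched here. The text only defines levels 1 and 2; extending the pattern to "level = 1 + largest symbol magnitude" is an assumption made for illustration, and the function names are hypothetical.

```python
def shell_level(symbols):
    """Shell level of one pixel, assuming level = 1 + largest symbol magnitude.

    symbols: the difference symbols of one pixel, one per spectral band.
    """
    return 1 + max(abs(s) for s in symbols)

def shell_pdf(pixels):
    """Relative frequency of each shell level over a list of pixels."""
    counts = {}
    for symbols in pixels:
        level = shell_level(symbols)
        counts[level] = counts.get(level, 0) + 1
    total = len(pixels)
    return {level: n / total for level, n in sorted(counts.items())}

sample = [(0, 0, 0, 0), (1, 0, 0, 0), (0, -1, 0, 0), (2, 0, -3, 0)]
print(shell_pdf(sample))
```
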

Figures 4.16 through 4.19 give the Huffman codes for the scene 
using SHELL, SSDI, SSDIA, and SSDIAM transforms. Parameters listed are 
the symbol or shell level, its probability, the length of the symbol 
code word in bits, and the actual binary code word assigned to the symbol. 
To conserve space, only symbol levels between -20 and +20 are given but 
code words are assigned to all symbols. The least probable symbols 
can be grouped together under a lumped Huffman prefix code word as 
described in Appendix B. Such is the case for the SSDI, SSDIA, and 
the SSDIAM codes. All grouped code words displayed are given an
asterisk following the symbol length and the lumped prefix code is given.
Following the Huffman code, the total probability of the grouped symbols 
is given as well as the prefix code length in bits. The entropy of 
the symbol distribution is also displayed. 



Figure 4.20 shows several time varying statistics of the coding 
techniques. The buffering statistics and the average bits per sample 
are given for each scan line of 180 pixels in each spectral band. 

Figure 4.20 only gives the first 119 scan lines of data using SSDIA 
symbols. If desired, the statistics can be presented for the SSDI 
or the SSDIAM symbols. 

Figure 4.20 permits a comparison of the line by line average bit rate 
for the global Huffman, the adaptive Huffman, and the Rice encoding 
algorithms. For the data used, the average number of bits per sample 
varies from 2.214 bits to 3.905 bits for global Huffman coding, from 
2.105 bits to 3.912 bits for adaptive Huffman coding, and from 2.249 bits 
to 4.321 bits for Rice encoding. In general, the average bits/sample 
varies rather slowly from line to line, following trends in the source 
data activity. Figure 4.21 presents the average data compression achieved
over the scene by the adaptive coding techniques. The adaptive Huffman 
coding achieves a lower average bit rate than the Rice coding because 
of the necessary overhead which must be transmitted with Rice encoding. 

In addition, statistics are given concerning the occurrence of the various
Rice modes. For the given data, the fundamental sequence (FS) was
transmitted 31.7 percent of the time, the coded fundamental sequence (FSC)
was transmitted 57.5 percent of the time, and the complemented fundamental
sequence (FSCB) was used 10.8 percent of the time. For this data the
split-pixel modes (6, 1), (4, 3) and (3, 4) did not occur.

A sample of the output table from program BLDTAB is given in
Figure 4.24. This decoding table is of length $2^{12}$ and each entry gives
the appropriate SSDI symbol and the number of shifts required to
reposition the compressed bit stream for the next decoding operation. The
beginning segment of the table gives symbols included under the lumped
prefix. This lumped prefix has four bits. The following eight bits
separate the lumped symbols. As shown, the overall code word is of
length twelve bits so that twelve shifts would be required in decoding.
The other decodable words in this segment of the decoding table are -2
and +2, each of length four bits.
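
A table of this kind can be sketched as follows. This is not the BLDTAB program, whose internals are not given here; the function names, the bit-string representation, and the small five-symbol code are hypothetical, chosen so the table has only $2^4$ entries instead of $2^{12}$.

```python
def build_decode_table(code, max_len):
    """Build a look-up table of length 2**max_len.

    code: dict mapping symbol -> codeword as a string of '0'/'1'
          (prefix-free, no codeword longer than max_len).
    Entry w holds (symbol, shifts) for the codeword that begins the
    max_len-bit window w; shifts is that codeword's length.
    """
    table = [None] * (1 << max_len)
    for symbol, word in code.items():
        pad = max_len - len(word)
        base = int(word, 2) << pad
        for tail in range(1 << pad):   # every window starting with this codeword
            table[base + tail] = (symbol, len(word))
    return table

def decode(bits, table, max_len, count):
    """Decode `count` symbols from a string of '0'/'1' characters."""
    padded = bits + "0" * max_len      # keeps the final window full width
    out = []
    pos = 0
    for _ in range(count):
        window = int(padded[pos:pos + max_len], 2)
        symbol, shifts = table[window]
        out.append(symbol)
        pos += shifts                  # reposition the stream for the next look-up
    return out

# Hypothetical code for illustration only.
code = {0: "0", 1: "10", -1: "110", 2: "1110", -2: "1111"}
table = build_decode_table(code, 4)
message = [0, 1, -2, -1, 0, 2]
bits = "".join(code[s] for s in message)
print(decode(bits, table, 4, len(message)))
```

Each decoding step costs one table access and one shift regardless of codeword length, which is the property that makes the look-up approach fast.
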

Figures 4.22 and 4.23 show the same segment of data from the 
input scene. Figure 4.22 gives the source digital data values in 
the first spectral band and Figure 4.23 gives the reconstructed digital 
data for the same band. Since the simulation is strictly information 
preserving, no errors have occurred in the data. 

Figure 4.22 A Segment of Input Data from Scene (Band 2)

Figure 4.23 A Segment of Reconstructed Data (Band 2)

Figure 4.24 A Segment of Table ITAB

APPENDIX B: HUFFMAN SOURCE CODING


Several algorithms exist for efficiently coding sources whose statistics 
are known. These techniques have been investigated at TRW and the Huffman 
code was chosen as being the most desirable algorithm for ground processing. 

The Huffman code has all the properties required to ensure unique decoding
with the minimum number of bits that can be obtained when coding one symbol
at a time, and it permits use of a "table look-up" decoding algorithm which
can be performed rapidly.

A difficulty encountered in practical applications is the cumbersome 
algorithm required for the classical synthesis of a Huffman code given the 
statistics of the source symbols S. TRW has developed a more efficient 
technique for generation of Huffman codes. This algorithm also permits group- 
ing of low probability symbols together for simplified decoding. Following a 
discussion of the classical Huffman code synthesis, the new algorithm will be 
described. 

The Classical Synthesis of Huffman Codes

Consider the source S with symbols $S_1, S_2, \ldots, S_q$ and symbol
probabilities $P_1, P_2, \ldots, P_q$, where $\sum_{i=1}^{q} P_i = 1$. Let the
symbols be ordered so that $P_1 \ge P_2 \ge \cdots \ge P_q$. Regarding the
last two symbols of S as combined into one symbol, we obtain a new source
from S containing only q-1 symbols. This new source is called a reduction
of S. The symbols of this reduction of S may be re-ordered again in terms
of decreasing probability and again the two least probable symbols of the
reduced S are combined to form a second reduction. By continuing this
reduction process, a sequence of sources is formed, each containing one
less symbol than the previous reduction. The process is finished when a
reduced source contains only two symbols.

A compact instantaneous binary code for the final reduction is the trivial
code with words 0 and 1. Working backward from this final reduction, the
Huffman code is synthesized as follows. Assume that a compact instantaneous
code has been found for $S_j$, one of the sources in a sequence of reduced
sources. One of the symbols of $S_j$, say $S_\alpha$, is formed from two
symbols of the preceding source $S_{j-1}$. Call these symbols $S_{\alpha 0}$
and $S_{\alpha 1}$. Each of the other symbols of $S_j$ corresponds to one of
the remaining symbols of $S_{j-1}$. The compact instantaneous code for
$S_{j-1}$ is formed from the code derived for $S_j$ as follows:

Assign to each symbol of $S_{j-1}$ (except $S_{\alpha 0}$ and $S_{\alpha 1}$)
the codeword used by the corresponding symbol of $S_j$. The codewords used
by $S_{\alpha 0}$ and $S_{\alpha 1}$ are formed by adding a 0 and 1,
respectively, to the codeword used for $S_\alpha$. An example of the
synthesis procedure for a given source is illustrated in Figure B1.
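
The classical procedure can be sketched with a priority queue. This is a common textbook formulation, not the TRW code: rather than storing every reduced source and then working backward, it prepends one bit to every member of the two combined symbols at each reduction, which yields the same codewords. The function name and the example probabilities are hypothetical (the source of Figure B1 is not reproduced here).

```python
import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict mapping symbol -> probability.
    Returns a dict mapping symbol -> codeword string of '0'/'1'."""
    if len(probs) < 2:
        raise ValueError("need at least two symbols")
    tie = count()  # tie-breaker so the heap never compares symbol lists
    heap = [(p, next(tie), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    code = {s: "" for s in probs}
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # least probable symbol
        p1, _, group1 = heapq.heappop(heap)   # next least probable symbol
        for s in group0:
            code[s] = "1" + code[s]
        for s in group1:
            code[s] = "0" + code[s]
        # The combined symbol takes its place in the reduced source.
        heapq.heappush(heap, (p0 + p1, next(tie), group0 + group1))
    return code

example = huffman_code({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1})
print(example)
```

The final pass through the loop combines the last two reduced symbols, which is where the trivial code with words 0 and 1 is assigned.
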
Each symbol $S_i$ of the source S is assigned a codeword of length $\ell_i$.
The average code length for this source is therefore

$$L = \sum_{i=1}^{q} P_i \ell_i$$

where L satisfies the inequality

$$L \ge H = -\sum_{i=1}^{q} P_i \log_2 P_i$$

where H is the entropy of the source S.
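
The two quantities can be computed directly from their definitions. The sketch below uses hypothetical function names and two made-up sources; the first has probabilities that are powers of one half, the case in which the average length equals the entropy.

```python
import math

def entropy(probs):
    """H = -sum(P_i * log2(P_i)) in bits per symbol; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs, lengths):
    """L = sum(P_i * l_i) in bits per symbol."""
    return sum(p * l for p, l in zip(probs, lengths))

dyadic = [0.5, 0.25, 0.125, 0.125]
print(entropy(dyadic), average_length(dyadic, [1, 2, 3, 3]))

skewed = [0.4, 0.3, 0.2, 0.1]
print(entropy(skewed), average_length(skewed, [1, 2, 3, 3]))
```
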

The difficulty imposed by the classical Huffman synthesis involves the 
forward flow of the code generation between successive reduced sources. This 
procedure is very inefficient in time and storage when used as the basis of a
computer algorithm for coding a source. 

An Improved Huffman Algorithm for Computers

The new algorithm separates the source reductions from the code synthesis.
The first part of the algorithm keeps track of the number of times each symbol
in the original source is grouped during the sequence of source reductions.
This contains all information as to the length of the codeword assigned to that
symbol in the resulting Huffman code. The second part of the algorithm uses
these lengths, $\ell_i$, to generate a Huffman code C for the source S.
Note that the resulting Huffman code may or may not be identical to the
code generated by the classical synthesis procedure, but the average code 
length is identical. Using the classical technique, many different Huffman 
codes can also be generated, depending on the assignment of 0 and 1 in each 
reduced source. 
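
The first part of the algorithm, counting how often each original symbol is grouped, can be sketched as below. The function name and the list-of-members bookkeeping are assumptions for illustration; the report's own program tracks membership differently.

```python
def code_lengths(probs):
    """Return the Huffman codeword length of each symbol without building
    any codewords.

    probs: list of symbol probabilities.
    """
    lengths = [0] * len(probs)
    # Each reduced symbol: (probability, indices of the original symbols in it).
    source = [(p, [i]) for i, p in enumerate(probs)]
    while len(source) > 1:
        source.sort(key=lambda entry: entry[0], reverse=True)  # decreasing probability
        p_a, members_a = source.pop()   # least probable
        p_b, members_b = source.pop()   # next least probable
        merged = members_a + members_b
        for i in merged:
            lengths[i] += 1             # one more grouping means one more code bit
        source.append((p_a + p_b, merged))
    return lengths

print(code_lengths([0.4, 0.3, 0.2, 0.1]))
```

The loop runs until a single reduced symbol remains, so the last pass accounts for the final bit that distinguishes the two symbols of the last reduction.
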

An example of the determination of codeword lengths, $\ell_i$, is given in
Figure B2 for the same source used in Figure B1. The second part of the
algorithm is illustrated in Figure B3. This part of the algorithm operates
as follows:

1. The lengths $\ell_i$ are ranked in the order of increasing length.

2. Symbol $S_k$ of minimum length, $\ell_k$, is assigned a codeword of all zeros.

3. Each successive symbol $S_m$ has a code formed as

   $$C_m = (C_{m-1} + 1) + (\ell_m - \ell_{m-1}) \text{ zeros.}$$


This algorithm is very fast and essentially separates the problem of code 
generation from that of source reduction. The only information which need 
be stored from the source reduction portion of the algorithm is the vector of 
code lengths. 
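
The three steps listed above translate almost line for line into code. In this sketch the function name is hypothetical, codewords are held as integers and printed as bit strings, and the lengths are assumed to come from a valid source reduction.

```python
def codes_from_lengths(lengths):
    """lengths: dict mapping symbol -> codeword length in bits.
    Returns a dict mapping symbol -> codeword string of '0'/'1'."""
    ranked = sorted(lengths, key=lambda s: lengths[s])   # step 1: increasing length
    first = ranked[0]
    value = 0
    prev_len = lengths[first]
    codes = {first: format(value, "0{}b".format(prev_len))}  # step 2: all zeros
    for s in ranked[1:]:
        n = lengths[s]
        value = (value + 1) << (n - prev_len)   # step 3: add 1, then append zeros
        codes[s] = format(value, "0{}b".format(n))
        prev_len = n
    return codes

print(codes_from_lengths({"a": 1, "b": 2, "c": 3, "d": 3}))
```

Because each codeword is the previous one plus one, padded with zeros, no codeword can be a prefix of a later one.
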

Low Probability Symbol Grouping 

Often the total number of symbols $S_i$ in source S is quite large and many
of these symbols have probabilities of a small fraction of one percent. To
save time in the encoding/decoding process at the expense of a small increase
in average code length, these low probability symbols can be lumped into a
single symbol. As an example, after ordering symbols with decreasing
probability of occurrence, the first J symbols are directly coded, where
$\sum_{i=1}^{J} P_i \ge .99$. The remaining symbols, having a total
probability $P_{J+1}$ of one percent or less, are grouped into symbol
$S_{J+1}$. If M symbols are lumped into $S_{J+1}$, R bits must be used to
describe these M symbols, where $R = \lceil \log_2 M \rceil$ and
$\lceil \cdot \rceil$ means next larger integer. During transmission,
codeword $C_{J+1}$ is followed by R bits to describe which of the M symbols
occurred. The average code length is lengthened by such a grouping by less
than R.
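
The choice of J, M and R can be sketched as follows. The function name, the `direct_mass` parameter (set to the 0.99 of the example above) and the sample probabilities are hypothetical.

```python
def group_low_probability(probs, direct_mass=0.99):
    """probs: probabilities sorted in decreasing order.

    Returns (J, M, R, P_lumped): J symbols are coded directly, the remaining
    M are lumped into one symbol of probability P_lumped, and R bits select
    among the lumped symbols.
    """
    total = 0.0
    j = 0
    while j < len(probs) and total < direct_mass:
        total += probs[j]
        j += 1
    m = len(probs) - j
    # For m >= 1, (m - 1).bit_length() equals ceil(log2(m)) without floating point.
    r = (m - 1).bit_length() if m > 0 else 0
    return j, m, r, sum(probs[j:])

probs = [0.5, 0.3, 0.15, 0.045, 0.002, 0.001, 0.001, 0.001]
print(group_low_probability(probs))
```

The R suffix bits are sent only when a lumped symbol occurs, that is, with probability $P_{J+1}$, which is why their contribution to the average code length stays small.
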





The advantage of grouping symbols which seldom occur is that the maximum
length of any code word can be held to some predetermined length N. This
simplifies the decoding algorithm and keeps the length of the required look-up
table to length $2^N$. These advantages in decoding are obtained at the possible
expense of a slightly increased average code length.

During the encoding of symbols S, whenever one of the symbols occurs
which is in the grouping, the compressor transmits the sequence of bits forming
code word $C_{J+1}$, followed by R bits to describe which grouped symbol occurred.
When the decoder encounters code word $C_{J+1}$, it uses the next R bits to decode
this grouped symbol.



COMPUTER PROGRAM 


A computer program has been developed and tested which accepts an array 
of symbols and generates the Huffman code. The program allows the operator 
to group symbols if desired and generates the grouped Huffman code and the 
average bit rate if R bits are used to separate the lumped symbols. 

The flowchart describing the program is given in Figure B4. The inputs 
required are the source symbols S, their associated probabilities P, and the 
maximum acceptable codeword length N. The program outputs the Huffman-coded
table HUF, which contains the coded bit stream C associated with the source
symbols S.

Two major subroutines are used in this program. Subroutine ORDER re- 
orders the symbols and their probabilities in a decreasing order so that the most 
probable symbols are at the top of an array 0. Subroutine GROUP adds the two 
least probable symbols in the array 0 to form a source reduction. This sub- 
routine also keeps count of the number of source reductions performed and 
keeps track of the original source symbols which have been combined to form 
each reduced symbol. Each symbol is given a bit position in an array V. If
symbols $S_1$, $S_3$ and $S_5$ have been combined in a source reduction, that
reduced symbol is represented in V as the binary word (...10101). This
representation allows a compact designation of groupings at each stage in the
reduction.
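
The bit-position representation can be illustrated with integer masks. The helper names are hypothetical, and placing $S_1$ in the least significant bit is an assumption; the example word 10101 reads the same from either end, so the text does not settle the bit order.

```python
def singleton_mask(i):
    """Mask for original symbol S_i (1-based): only bit i-1 is set."""
    return 1 << (i - 1)

def members(mask):
    """1-based indices of the original symbols contained in a reduced symbol."""
    return [i + 1 for i in range(mask.bit_length()) if (mask >> i) & 1]

# Combining S_1, S_3 and S_5 is a bitwise OR of their masks.
combined = singleton_mask(1) | singleton_mask(3) | singleton_mask(5)
print(format(combined, "b"), members(combined))
```
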

In operation, the program takes the array of input symbols and their
probabilities, calls ORDER to rank them, and combines the M least probable
symbols to form the grouped symbol of probability

$$P_{J-M+1} = \sum_{i=J-M+1}^{J} P_i$$

(assuming $P_1 \ge \cdots \ge P_{J-1} \ge P_J$), where the sum runs over the
M least probable of the J original symbols. This new set of J-M+1 symbols
forms the input to the basic algorithm in which successive calls to
subroutines GROUP and ORDER generate successive source reductions until only
two reduced symbols remain. At each stage of the reduction, array LENGTH is
updated by one for each symbol in S which has been combined to form one of
the reduced symbols which have been grouped in that step.
Following the reduction process, the array LENGTH is used to compute the
binary codeword associated with all of the J-M+1 non-grouped source symbols.
LENGTH is re-ordered so that the most probable symbols which have the shortest
code lengths are at the top of the array. A test takes place after LENGTH is
re-ordered. If the longest codeword exceeds N bits, more source symbols are
grouped and the source reductions performed again until the maximum codeword
length is N or less. With 256 source symbols, such an occurrence is guaranteed
at some stage of grouping.
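
The regrouping test can be sketched as a loop over the number of lumped symbols. Several choices here are assumptions rather than facts about the TRW program: the function names, lumping one more symbol per pass, sizing the suffix as the smallest number of bits that can index the lumped symbols (the program described here uses a fixed 8 bits), and counting the suffix bits against the limit N.

```python
def code_lengths(probs):
    """Huffman codeword lengths found by counting groupings during reduction."""
    lengths = [0] * len(probs)
    source = [(p, [i]) for i, p in enumerate(probs)]
    while len(source) > 1:
        source.sort(key=lambda entry: entry[0], reverse=True)
        p_a, a = source.pop()
        p_b, b = source.pop()
        for i in a + b:
            lengths[i] += 1
        source.append((p_a + p_b, a + b))
    return lengths

def group_until_fits(probs, max_len):
    """probs: probabilities in decreasing order.

    Returns (m, lengths): m is the number of lumped symbols (0 if none), and
    lengths are the code lengths of the reduced source, with the lumped
    symbol last and its suffix bits included.
    """
    for m in [0] + list(range(2, len(probs) + 1)):
        if m == 0:
            lengths = code_lengths(probs)
        else:
            keep = len(probs) - m
            lengths = code_lengths(probs[:keep] + [sum(probs[keep:])])
            lengths[-1] += (m - 1).bit_length()   # suffix bits pick the lumped symbol
        if max(lengths) <= max_len:
            return m, lengths
    raise ValueError("max_len is too small to index the symbols")

probs = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125]
print(group_until_fits(probs, 4))
```
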

The generation of the codes then begins with the minimum length codeword
and proceeds from word to word with the successive steps of adding 1 to the 
previous codeword and adding the required number of zeros to fill the word. 

Table HUF is then generated where all entries corresponding to non-grouped
symbols contain the computed Huffman codeword. For all grouped symbols, the
entry in HUF contains the lumped prefix codeword $C_{J-M+1}$ followed by 8 bits
giving the symbol directly.

This program has been written by TRW and tested using SSDI encoding of
subscenes from the MSS tape ERTS E-1025-15103.

Figure B1: Classical Huffman Code Synthesis (panel (b): Code Synthesis)

Figure B2: Determination of Code Word Lengths, $\ell_i$